Computing Runs on a General Alphabet
Abstract
We describe a RAM algorithm that computes all runs (maximal repetitions) of a given string of length $n$ over a general ordered alphabet in $O(n \log^{2/3} n)$ time and linear space. Our algorithm outperforms all known solutions working in $\Theta(n \log \sigma)$ time provided $\sigma = n^{\Omega(1)}$, where $\sigma$ is the number of distinct letters in the input string. We conjecture that there exists a linear-time RAM algorithm finding all runs.
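The abstract does not spell out the algorithm, so as a point of reference, here is a minimal brute-force sketch of what "computing all runs" means. It assumes the standard definition (a run is a substring $s[i..j]$ whose smallest period $p$ satisfies $j - i + 1 \ge 2p$ and which cannot be extended left or right with the same period), using 0-based inclusive indices; the function names `naive_runs` and `smallest_period` are hypothetical. It relies only on symbol equality comparisons, so it works over a general alphabet, but it runs in roughly quadratic time and is not the $O(n \log^{2/3} n)$ algorithm described above.

```python
def prefix_function(t):
    """KMP failure function: pi[i] = length of the longest proper border of t[:i + 1]."""
    pi = [0] * len(t)
    k = 0
    for i in range(1, len(t)):
        while k > 0 and t[i] != t[k]:
            k = pi[k - 1]
        if t[i] == t[k]:
            k += 1
        pi[i] = k
    return pi


def smallest_period(t):
    """Smallest period of a non-empty sequence t."""
    return len(t) - prefix_function(t)[-1]


def naive_runs(s):
    """All runs of s as (start, end, period) triples, 0-based and inclusive.

    Brute force (hypothetical helper): for each candidate period p, scan for maximal
    stretches where s[x] == s[x + p]; each stretch gives a maximal p-periodic substring,
    which is a run iff it has length >= 2p and p is its smallest period.
    """
    n = len(s)
    runs = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and s[k] == s[k + p]:
                k += 1
            # s[start : k + p] has period p and cannot be extended on either side.
            length = k - start + p
            if length >= 2 * p and smallest_period(s[start:k + p]) == p:
                runs.add((start, start + length - 1, p))
    return sorted(runs)
```

For example, `naive_runs("aabaab")` yields `[(0, 1, 1), (0, 5, 3), (3, 4, 1)]`: the two `aa` blocks and the whole string with period 3.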
Similar articles
On Runs in Independent Sequences
Given an i.i.d. sequence of n letters from a finite alphabet, we consider the length of the longest run of any letter. In the equiprobable case, results for this run turn out to be closely related to the well-known results for the longest run of a given letter. For coin-tossing, tail probabilities are compared for both kinds of runs via Poisson approximation.
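Note that "run" here means a block of equal consecutive letters, not a maximal repetition in the sense of the paper above. Both statistics compared in this abstract are easy to compute for a given sequence; a small sketch (function names hypothetical) using `itertools.groupby`:

```python
from itertools import groupby


def longest_run_any(seq):
    """Length of the longest run of any single letter in seq (0 for an empty sequence)."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def longest_run_of(seq, letter):
    """Length of the longest run of the given letter in seq (0 if it never occurs)."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(seq) if key == letter),
        default=0,
    )


print(longest_run_any("HHTTTH"), longest_run_of("HHTTTH", "H"))  # 3 2
```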
Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries
Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the r...
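To make the link between LCE queries and runs concrete, here is a naive sketch (all names hypothetical): `lce` answers a query by direct symbol comparison, and `extend_period` uses one forward and one backward extension to find the maximal interval around position `i` on which the string has period `p`, which is the usual way candidate repetitions are extended into runs. Each query costs time proportional to its answer, so this does not reflect the preprocessing-based or non-crossing techniques the paper discusses. The returned interval has period `p`, but its smallest period may be smaller, so it is a run only when `p` is minimal.

```python
def lce(s, i, j):
    """Longest common extension: length of the longest common prefix of s[i:] and s[j:]."""
    length = 0
    while i + length < len(s) and j + length < len(s) and s[i + length] == s[j + length]:
        length += 1
    return length


def lce_back(s, i, j):
    """Backward extension: length of the longest common suffix of s[:i + 1] and s[:j + 1]."""
    length = 0
    while i - length >= 0 and j - length >= 0 and s[i - length] == s[j - length]:
        length += 1
    return length


def extend_period(s, i, p):
    """Maximal interval (b, e), 0-based inclusive, around position i on which s has period p.

    Returns None if i + p is out of range or the interval is shorter than 2p.
    """
    if i + p >= len(s):
        return None
    right = lce(s, i, i + p)                               # s[x] == s[x + p] for x in [i, i + right)
    left = lce_back(s, i - 1, i + p - 1) if i > 0 else 0   # same property for x in [i - left, i)
    b, e = i - left, i + right + p - 1
    return (b, e) if e - b + 1 >= 2 * p else None
```

For example, `extend_period("abab", 1, 2)` returns `(0, 3)`: one backward and one forward step extend the period-2 repetition over the whole string.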
A Further Note on Runs in Independent Sequences
Given a sequence of letters generated independently from a finite alphabet, we consider the case when more than one, but not all, letters are generated with the highest probability. The length of the longest run of any of these letters is shown to be one greater than the length of the longest run in a particular state of an associated Markov chain. Using results of Foulser and Karlin (19...
New Algorithms for the Longest Common Subsequence Problem
Given two sequences $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$, $m \le n$, over some alphabet, a common subsequence $C = c_1 c_2 \ldots c_l$ of $A$ and $B$ is a sequence that can be obtained from both $A$ and $B$ by deleting zero or more (not necessarily adjacent) symbols. Finding a common subsequence of maximal length is called the Longest Common Subsequence (LCS) Problem. Two new algorithms based on the wel...
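The abstract is cut off before the two new algorithms are described. For reference only, here is a sketch of the classical $O(mn)$ dynamic program for the LCS problem as defined above; it is not either of the paper's algorithms, and the function name `lcs` is illustrative.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b as a list (textbook O(mn) DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of an LCS of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Walk back from dp[m][n] to recover one subsequence of maximal length.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```

Several subsequences of maximal length may exist; the tie-breaking rule in the walk-back determines which one is returned.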
Faster Longest Common Extension Queries in Strings over General Alphabets
Longest common extension queries (often called longest common prefix queries) constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of q LCE queries for a string of size n over a general ordered alphabet can be realized in O(q log log n + n log n) time making only O(q + n) symbol comparisons. C...
Journal title:
- Inf. Process. Lett.
Volume 116
Publication date: 2016